Assignment 1: Exploratory Data Analysis (EDA)

Introduction

In the banking sector, sound decisions when evaluating loan applications are essential to minimize risk and maximize profit. This assignment examines a dataset of loan applications using the Exploratory Data Analysis (EDA) techniques covered in class.

Objectives

- Identify patterns in the data that indicate whether applicants are able to meet their financial obligations, so that applicants capable of repaying the loan are not rejected and profiles likely to struggle with the debt are detected.

- Answer the key question: is there a type of client more likely to default on a loan? The answer should guide the bank's decisions and help it limit its exposure.

Notebook 1: Development

Importing libraries

In [1]:
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import plotly.express as px

import sys
sys.path.append('/Users/miguelflores/Desktop/P1/practica1')
from funciones import funciones_auxiliares as f_aux

pd.set_option("display.max_rows", 10000)
pd.set_option("display.max_columns", 10000)
pd.set_option("display.width", 10000)

Loading and reading the dataset

In [2]:
# Set 'SK_ID_CURR' as the index so that rows can be accessed easily by client identifier.
df = pd.read_csv("/Users/miguelflores/Desktop/df_practica1.csv").set_index("SK_ID_CURR")
Note that an absolute path is used here because the dataset is too large to be hosted on GitHub. When file size is not an obstacle, the data file should be read from a relative path pointing into the GitHub repository.
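As a minimal sketch of the relative-path approach, assuming a hypothetical repository layout in which the CSV sits in a data/ folder next to the notebook (the folder name and the load_dataset helper are illustrative, not part of the original project):

```python
from pathlib import Path

import pandas as pd


def load_dataset(csv_path: Path) -> pd.DataFrame:
    """Read the loan applications CSV and index it by SK_ID_CURR."""
    return pd.read_csv(csv_path).set_index("SK_ID_CURR")


# Hypothetical location, relative to the directory the notebook runs from.
DATA_PATH = Path("data") / "df_practica1.csv"

# Only load when the file is actually present, so the sketch runs anywhere.
if DATA_PATH.exists():
    df = load_dataset(DATA_PATH)
```

Because the path is resolved relative to the working directory, the same cell would run unchanged on any machine that clones the repository with that layout.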
In [3]:
df.head()
Out[3]:
TARGET NAME_CONTRACT_TYPE CODE_GENDER FLAG_OWN_CAR FLAG_OWN_REALTY CNT_CHILDREN AMT_INCOME_TOTAL AMT_CREDIT AMT_ANNUITY AMT_GOODS_PRICE NAME_TYPE_SUITE NAME_INCOME_TYPE NAME_EDUCATION_TYPE NAME_FAMILY_STATUS NAME_HOUSING_TYPE REGION_POPULATION_RELATIVE DAYS_BIRTH DAYS_EMPLOYED DAYS_REGISTRATION DAYS_ID_PUBLISH OWN_CAR_AGE FLAG_MOBIL FLAG_EMP_PHONE FLAG_WORK_PHONE FLAG_CONT_MOBILE FLAG_PHONE FLAG_EMAIL OCCUPATION_TYPE CNT_FAM_MEMBERS REGION_RATING_CLIENT REGION_RATING_CLIENT_W_CITY WEEKDAY_APPR_PROCESS_START HOUR_APPR_PROCESS_START REG_REGION_NOT_LIVE_REGION REG_REGION_NOT_WORK_REGION LIVE_REGION_NOT_WORK_REGION REG_CITY_NOT_LIVE_CITY REG_CITY_NOT_WORK_CITY LIVE_CITY_NOT_WORK_CITY ORGANIZATION_TYPE EXT_SOURCE_1 EXT_SOURCE_2 EXT_SOURCE_3 APARTMENTS_AVG BASEMENTAREA_AVG YEARS_BEGINEXPLUATATION_AVG YEARS_BUILD_AVG COMMONAREA_AVG ELEVATORS_AVG ENTRANCES_AVG FLOORSMAX_AVG FLOORSMIN_AVG LANDAREA_AVG LIVINGAPARTMENTS_AVG LIVINGAREA_AVG NONLIVINGAPARTMENTS_AVG NONLIVINGAREA_AVG APARTMENTS_MODE BASEMENTAREA_MODE YEARS_BEGINEXPLUATATION_MODE YEARS_BUILD_MODE COMMONAREA_MODE ELEVATORS_MODE ENTRANCES_MODE FLOORSMAX_MODE FLOORSMIN_MODE LANDAREA_MODE LIVINGAPARTMENTS_MODE LIVINGAREA_MODE NONLIVINGAPARTMENTS_MODE NONLIVINGAREA_MODE APARTMENTS_MEDI BASEMENTAREA_MEDI YEARS_BEGINEXPLUATATION_MEDI YEARS_BUILD_MEDI COMMONAREA_MEDI ELEVATORS_MEDI ENTRANCES_MEDI FLOORSMAX_MEDI FLOORSMIN_MEDI LANDAREA_MEDI LIVINGAPARTMENTS_MEDI LIVINGAREA_MEDI NONLIVINGAPARTMENTS_MEDI NONLIVINGAREA_MEDI FONDKAPREMONT_MODE HOUSETYPE_MODE TOTALAREA_MODE WALLSMATERIAL_MODE EMERGENCYSTATE_MODE OBS_30_CNT_SOCIAL_CIRCLE DEF_30_CNT_SOCIAL_CIRCLE OBS_60_CNT_SOCIAL_CIRCLE DEF_60_CNT_SOCIAL_CIRCLE DAYS_LAST_PHONE_CHANGE FLAG_DOCUMENT_2 FLAG_DOCUMENT_3 FLAG_DOCUMENT_4 FLAG_DOCUMENT_5 FLAG_DOCUMENT_6 FLAG_DOCUMENT_7 FLAG_DOCUMENT_8 FLAG_DOCUMENT_9 FLAG_DOCUMENT_10 FLAG_DOCUMENT_11 FLAG_DOCUMENT_12 FLAG_DOCUMENT_13 FLAG_DOCUMENT_14 FLAG_DOCUMENT_15 FLAG_DOCUMENT_16 FLAG_DOCUMENT_17 FLAG_DOCUMENT_18 FLAG_DOCUMENT_19 
FLAG_DOCUMENT_20 FLAG_DOCUMENT_21 AMT_REQ_CREDIT_BUREAU_HOUR AMT_REQ_CREDIT_BUREAU_DAY AMT_REQ_CREDIT_BUREAU_WEEK AMT_REQ_CREDIT_BUREAU_MON AMT_REQ_CREDIT_BUREAU_QRT AMT_REQ_CREDIT_BUREAU_YEAR
SK_ID_CURR
100002 1 Cash loans M N Y 0 202500.0 406597.5 24700.5 351000.0 Unaccompanied Working Secondary / secondary special Single / not married House / apartment 0.018801 -9461 -637 -3648.0 -2120 NaN 1 1 0 1 1 0 Laborers 1.0 2 2 WEDNESDAY 10 0 0 0 0 0 0 Business Entity Type 3 0.083037 0.262949 0.139376 0.0247 0.0369 0.9722 0.6192 0.0143 0.00 0.0690 0.0833 0.1250 0.0369 0.0202 0.0190 0.0000 0.0000 0.0252 0.0383 0.9722 0.6341 0.0144 0.0000 0.0690 0.0833 0.1250 0.0377 0.022 0.0198 0.0 0.0 0.0250 0.0369 0.9722 0.6243 0.0144 0.00 0.0690 0.0833 0.1250 0.0375 0.0205 0.0193 0.0000 0.00 reg oper account block of flats 0.0149 Stone, brick No 2.0 2.0 2.0 2.0 -1134.0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 1.0
100003 0 Cash loans F N N 0 270000.0 1293502.5 35698.5 1129500.0 Family State servant Higher education Married House / apartment 0.003541 -16765 -1188 -1186.0 -291 NaN 1 1 0 1 1 0 Core staff 2.0 1 1 MONDAY 11 0 0 0 0 0 0 School 0.311267 0.622246 NaN 0.0959 0.0529 0.9851 0.7960 0.0605 0.08 0.0345 0.2917 0.3333 0.0130 0.0773 0.0549 0.0039 0.0098 0.0924 0.0538 0.9851 0.8040 0.0497 0.0806 0.0345 0.2917 0.3333 0.0128 0.079 0.0554 0.0 0.0 0.0968 0.0529 0.9851 0.7987 0.0608 0.08 0.0345 0.2917 0.3333 0.0132 0.0787 0.0558 0.0039 0.01 reg oper account block of flats 0.0714 Block No 1.0 0.0 1.0 0.0 -828.0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0
100004 0 Revolving loans M Y Y 0 67500.0 135000.0 6750.0 135000.0 Unaccompanied Working Secondary / secondary special Single / not married House / apartment 0.010032 -19046 -225 -4260.0 -2531 26.0 1 1 1 1 1 0 Laborers 1.0 2 2 MONDAY 9 0 0 0 0 0 0 Government NaN 0.555912 0.729567 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 0.0 0.0 0.0 -815.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0
100006 0 Cash loans F N Y 0 135000.0 312682.5 29686.5 297000.0 Unaccompanied Working Secondary / secondary special Civil marriage House / apartment 0.008019 -19005 -3039 -9833.0 -2437 NaN 1 1 0 1 0 0 Laborers 2.0 2 2 WEDNESDAY 17 0 0 0 0 0 0 Business Entity Type 3 NaN 0.650442 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2.0 0.0 2.0 0.0 -617.0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN NaN NaN NaN NaN NaN
100007 0 Cash loans M N Y 0 121500.0 513000.0 21865.5 513000.0 Unaccompanied Working Secondary / secondary special Single / not married House / apartment 0.028663 -19932 -3038 -4311.0 -3458 NaN 1 1 0 1 0 0 Core staff 1.0 2 2 THURSDAY 11 0 0 0 0 1 1 Religion NaN 0.322738 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 0.0 0.0 0.0 -1106.0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0

General analysis of the table

Dimensions

In [4]:
print(df.shape, df.drop_duplicates().shape)
(307511, 121) (307511, 121)
From the output above, the DataFrame contains 121 variables and 307,511 observations. Because the shape is unchanged after drop_duplicates(), there are no fully duplicated rows.
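The shape comparison can also be expressed as a direct count of repeated rows. A small sketch on toy data (the toy frame is invented for illustration and is unrelated to the loan dataset):

```python
import pandas as pd

# Toy data, invented for illustration: the second and third rows are identical.
toy = pd.DataFrame({"a": [1, 2, 2], "b": ["x", "y", "y"]})

# duplicated() flags every repeat of an earlier row, so the sum is the
# number of rows that drop_duplicates() would remove.
n_duplicates = int(toy.duplicated().sum())
n_rows, n_cols = toy.shape

print(n_rows, n_cols, n_duplicates)
```

A count of zero is equivalent to the two shapes being equal, as observed for the loan data.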
Data types and their contents
To keep the code clean and efficient, auxiliary functions are used from this section onward. They are collected in a separate module, which makes the code easier to reuse. The function tipos_datos( ) shown below reports the type of each variable together with its contents. In its output, 'Contenido' means 'content' and 'Más de 30 valores' means 'more than 30 values'.
In [5]:
f_aux.tipos_datos(df)
TARGET                         int64        Contenido: 1, 0
NAME_CONTRACT_TYPE             object       Contenido: Cash loans, Revolving loans
CODE_GENDER                    object       Contenido: M, F, XNA
FLAG_OWN_CAR                   object       Contenido: N, Y
FLAG_OWN_REALTY                object       Contenido: Y, N
CNT_CHILDREN                   int64        Contenido: 0, 1, 2, 3, 4, 7, 5, 6, 8, 9, 11, 12, 10, 19, 14
AMT_INCOME_TOTAL               float64      Contenido: Más de 30 valores
AMT_CREDIT                     float64      Contenido: Más de 30 valores
AMT_ANNUITY                    float64      Contenido: Más de 30 valores
AMT_GOODS_PRICE                float64      Contenido: Más de 30 valores
NAME_TYPE_SUITE                object       Contenido: Unaccompanied, Family, Spouse, partner, Children, Other_A, nan, Other_B, Group of people
NAME_INCOME_TYPE               object       Contenido: Working, State servant, Commercial associate, Pensioner, Unemployed, Student, Businessman, Maternity leave
NAME_EDUCATION_TYPE            object       Contenido: Secondary / secondary special, Higher education, Incomplete higher, Lower secondary, Academic degree
NAME_FAMILY_STATUS             object       Contenido: Single / not married, Married, Civil marriage, Widow, Separated, Unknown
NAME_HOUSING_TYPE              object       Contenido: House / apartment, Rented apartment, With parents, Municipal apartment, Office apartment, Co-op apartment
REGION_POPULATION_RELATIVE     float64      Contenido: Más de 30 valores
DAYS_BIRTH                     int64        Contenido: Más de 30 valores
DAYS_EMPLOYED                  int64        Contenido: Más de 30 valores
DAYS_REGISTRATION              float64      Contenido: Más de 30 valores
DAYS_ID_PUBLISH                int64        Contenido: Más de 30 valores
OWN_CAR_AGE                    float64      Contenido: Más de 30 valores
FLAG_MOBIL                     int64        Contenido: 1, 0
FLAG_EMP_PHONE                 int64        Contenido: 1, 0
FLAG_WORK_PHONE                int64        Contenido: 0, 1
FLAG_CONT_MOBILE               int64        Contenido: 1, 0
FLAG_PHONE                     int64        Contenido: 1, 0
FLAG_EMAIL                     int64        Contenido: 0, 1
OCCUPATION_TYPE                object       Contenido: Laborers, Core staff, Accountants, Managers, nan, Drivers, Sales staff, Cleaning staff, Cooking staff, Private service staff, Medicine staff, Security staff, High skill tech staff, Waiters/barmen staff, Low-skill Laborers, Realty agents, Secretaries, IT staff, HR staff
CNT_FAM_MEMBERS                float64      Contenido: 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 9.0, 7.0, 8.0, 10.0, 13.0, nan, 14.0, 12.0, 20.0, 15.0, 16.0, 11.0
REGION_RATING_CLIENT           int64        Contenido: 2, 1, 3
REGION_RATING_CLIENT_W_CITY    int64        Contenido: 2, 1, 3
WEEKDAY_APPR_PROCESS_START     object       Contenido: WEDNESDAY, MONDAY, THURSDAY, SUNDAY, SATURDAY, FRIDAY, TUESDAY
HOUR_APPR_PROCESS_START        int64        Contenido: 10, 11, 9, 17, 16, 14, 8, 15, 7, 13, 6, 12, 19, 3, 18, 21, 4, 5, 20, 22, 1, 2, 23, 0
REG_REGION_NOT_LIVE_REGION     int64        Contenido: 0, 1
REG_REGION_NOT_WORK_REGION     int64        Contenido: 0, 1
LIVE_REGION_NOT_WORK_REGION    int64        Contenido: 0, 1
REG_CITY_NOT_LIVE_CITY         int64        Contenido: 0, 1
REG_CITY_NOT_WORK_CITY         int64        Contenido: 0, 1
LIVE_CITY_NOT_WORK_CITY        int64        Contenido: 0, 1
ORGANIZATION_TYPE              object       Contenido: Más de 30 valores
EXT_SOURCE_1                   float64      Contenido: Más de 30 valores
EXT_SOURCE_2                   float64      Contenido: Más de 30 valores
EXT_SOURCE_3                   float64      Contenido: Más de 30 valores
APARTMENTS_AVG                 float64      Contenido: Más de 30 valores
BASEMENTAREA_AVG               float64      Contenido: Más de 30 valores
YEARS_BEGINEXPLUATATION_AVG    float64      Contenido: Más de 30 valores
YEARS_BUILD_AVG                float64      Contenido: Más de 30 valores
COMMONAREA_AVG                 float64      Contenido: Más de 30 valores
ELEVATORS_AVG                  float64      Contenido: Más de 30 valores
ENTRANCES_AVG                  float64      Contenido: Más de 30 valores
FLOORSMAX_AVG                  float64      Contenido: Más de 30 valores
FLOORSMIN_AVG                  float64      Contenido: Más de 30 valores
LANDAREA_AVG                   float64      Contenido: Más de 30 valores
LIVINGAPARTMENTS_AVG           float64      Contenido: Más de 30 valores
LIVINGAREA_AVG                 float64      Contenido: Más de 30 valores
NONLIVINGAPARTMENTS_AVG        float64      Contenido: Más de 30 valores
NONLIVINGAREA_AVG              float64      Contenido: Más de 30 valores
APARTMENTS_MODE                float64      Contenido: Más de 30 valores
BASEMENTAREA_MODE              float64      Contenido: Más de 30 valores
YEARS_BEGINEXPLUATATION_MODE   float64      Contenido: Más de 30 valores
YEARS_BUILD_MODE               float64      Contenido: Más de 30 valores
COMMONAREA_MODE                float64      Contenido: Más de 30 valores
ELEVATORS_MODE                 float64      Contenido: 0.0, 0.0806, nan, 0.1611, 0.4028, 0.1208, 0.282, 0.0403, 0.2417, 0.8862, 0.3222, 0.3625, 0.6848, 0.5639, 0.6042, 0.2014, 0.5236, 0.4431, 0.4834, 0.6445, 0.725, 1.0, 0.8459, 0.9667, 0.8056, 0.9264, 0.7653
ENTRANCES_MODE                 float64      Contenido: Más de 30 valores
FLOORSMAX_MODE                 float64      Contenido: 0.0833, 0.2917, nan, 0.1667, 0.3333, 0.6667, 0.375, 0.0417, 0.25, 0.4583, 0.2083, 0.125, 0.0, 0.5833, 0.625, 0.9167, 0.9583, 0.5417, 1.0, 0.4167, 0.875, 0.7083, 0.75, 0.5, 0.7917, 0.8333
FLOORSMIN_MODE                 float64      Contenido: 0.125, 0.3333, nan, 0.375, 0.7083, 0.0417, 0.2083, 0.4167, 0.2917, 0.0, 0.5, 0.625, 0.0833, 0.1667, 0.6667, 0.25, 0.5833, 1.0, 0.9583, 0.5417, 0.9167, 0.75, 0.8333, 0.4583, 0.7917, 0.875
LANDAREA_MODE                  float64      Contenido: Más de 30 valores
LIVINGAPARTMENTS_MODE          float64      Contenido: Más de 30 valores
LIVINGAREA_MODE                float64      Contenido: Más de 30 valores
NONLIVINGAPARTMENTS_MODE       float64      Contenido: Más de 30 valores
NONLIVINGAREA_MODE             float64      Contenido: Más de 30 valores
APARTMENTS_MEDI                float64      Contenido: Más de 30 valores
BASEMENTAREA_MEDI              float64      Contenido: Más de 30 valores
YEARS_BEGINEXPLUATATION_MEDI   float64      Contenido: Más de 30 valores
YEARS_BUILD_MEDI               float64      Contenido: Más de 30 valores
COMMONAREA_MEDI                float64      Contenido: Más de 30 valores
ELEVATORS_MEDI                 float64      Contenido: Más de 30 valores
ENTRANCES_MEDI                 float64      Contenido: Más de 30 valores
FLOORSMAX_MEDI                 float64      Contenido: Más de 30 valores
FLOORSMIN_MEDI                 float64      Contenido: Más de 30 valores
LANDAREA_MEDI                  float64      Contenido: Más de 30 valores
LIVINGAPARTMENTS_MEDI          float64      Contenido: Más de 30 valores
LIVINGAREA_MEDI                float64      Contenido: Más de 30 valores
NONLIVINGAPARTMENTS_MEDI       float64      Contenido: Más de 30 valores
NONLIVINGAREA_MEDI             float64      Contenido: Más de 30 valores
FONDKAPREMONT_MODE             object       Contenido: reg oper account, nan, org spec account, reg oper spec account, not specified
HOUSETYPE_MODE                 object       Contenido: block of flats, nan, terraced house, specific housing
TOTALAREA_MODE                 float64      Contenido: Más de 30 valores
WALLSMATERIAL_MODE             object       Contenido: Stone, brick, Block, nan, Panel, Mixed, Wooden, Others, Monolithic
EMERGENCYSTATE_MODE            object       Contenido: No, nan, Yes
OBS_30_CNT_SOCIAL_CIRCLE       float64      Contenido: Más de 30 valores
DEF_30_CNT_SOCIAL_CIRCLE       float64      Contenido: 2.0, 0.0, 1.0, nan, 3.0, 4.0, 5.0, 6.0, 7.0, 34.0, 8.0
OBS_60_CNT_SOCIAL_CIRCLE       float64      Contenido: Más de 30 valores
DEF_60_CNT_SOCIAL_CIRCLE       float64      Contenido: 2.0, 0.0, 1.0, nan, 3.0, 5.0, 4.0, 7.0, 24.0, 6.0
DAYS_LAST_PHONE_CHANGE         float64      Contenido: Más de 30 valores
FLAG_DOCUMENT_2                int64        Contenido: 0, 1
FLAG_DOCUMENT_3                int64        Contenido: 1, 0
FLAG_DOCUMENT_4                int64        Contenido: 0, 1
FLAG_DOCUMENT_5                int64        Contenido: 0, 1
FLAG_DOCUMENT_6                int64        Contenido: 0, 1
FLAG_DOCUMENT_7                int64        Contenido: 0, 1
FLAG_DOCUMENT_8                int64        Contenido: 0, 1
FLAG_DOCUMENT_9                int64        Contenido: 0, 1
FLAG_DOCUMENT_10               int64        Contenido: 0, 1
FLAG_DOCUMENT_11               int64        Contenido: 0, 1
FLAG_DOCUMENT_12               int64        Contenido: 0, 1
FLAG_DOCUMENT_13               int64        Contenido: 0, 1
FLAG_DOCUMENT_14               int64        Contenido: 0, 1
FLAG_DOCUMENT_15               int64        Contenido: 0, 1
FLAG_DOCUMENT_16               int64        Contenido: 0, 1
FLAG_DOCUMENT_17               int64        Contenido: 0, 1
FLAG_DOCUMENT_18               int64        Contenido: 0, 1
FLAG_DOCUMENT_19               int64        Contenido: 0, 1
FLAG_DOCUMENT_20               int64        Contenido: 0, 1
FLAG_DOCUMENT_21               int64        Contenido: 0, 1
AMT_REQ_CREDIT_BUREAU_HOUR     float64      Contenido: 0.0, nan, 1.0, 2.0, 3.0, 4.0
AMT_REQ_CREDIT_BUREAU_DAY      float64      Contenido: 0.0, nan, 1.0, 3.0, 2.0, 4.0, 5.0, 6.0, 9.0, 8.0
AMT_REQ_CREDIT_BUREAU_WEEK     float64      Contenido: 0.0, nan, 1.0, 3.0, 2.0, 4.0, 5.0, 6.0, 8.0, 7.0
AMT_REQ_CREDIT_BUREAU_MON      float64      Contenido: 0.0, nan, 1.0, 2.0, 6.0, 5.0, 3.0, 7.0, 9.0, 4.0, 11.0, 8.0, 16.0, 12.0, 14.0, 10.0, 13.0, 17.0, 24.0, 19.0, 15.0, 23.0, 18.0, 27.0, 22.0
AMT_REQ_CREDIT_BUREAU_QRT      float64      Contenido: 0.0, nan, 1.0, 2.0, 4.0, 3.0, 8.0, 5.0, 6.0, 7.0, 261.0, 19.0
AMT_REQ_CREDIT_BUREAU_YEAR     float64      Contenido: 1.0, 0.0, nan, 2.0, 4.0, 5.0, 3.0, 8.0, 6.0, 9.0, 7.0, 10.0, 11.0, 13.0, 16.0, 12.0, 25.0, 23.0, 15.0, 14.0, 22.0, 17.0, 19.0, 18.0, 21.0, 20.0
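
The source of tipos_datos lives in the auxiliary module and is not shown in this notebook. As a rough sketch of how a helper producing this kind of listing could be written (the name describe_columns, the max_values parameter and the exact formatting are assumptions, not the author's code):

```python
import pandas as pd


def describe_columns(df: pd.DataFrame, max_values: int = 30) -> list[str]:
    """Return one line per column: name, dtype and distinct values (if few)."""
    lines = []
    for col in df.columns:
        values = df[col].unique()  # NaN counts as one distinct value
        if len(values) > max_values:
            content = f"More than {max_values} values"
        else:
            content = ", ".join(str(v) for v in values)
        lines.append(f"{col:<30} {str(df[col].dtype):<12} Content: {content}")
    return lines


# Toy frame, invented for illustration.
toy = pd.DataFrame({"TARGET": [1, 0, 0], "AMT": [1.5, 2.5, 3.5]})
for line in describe_columns(toy, max_values=2):
    print(line)
```

Listing the distinct values only for low-cardinality columns keeps the report short while still exposing unexpected categories such as XNA or Unknown.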

The first variable listed is the target variable (TARGET), which takes the values 1 and 0, meaning:

1 - Client with payment difficulties: a payment more than X days late on at least one of the first Y installments of the loan
0 - Client without payment difficulties

Considerations about the variables to be used in the model

Reviewing the variables, we identified some that carry information which may not be available at prediction time and could therefore distort the results obtained with the model: these variables are built from historical data, so they would not be available at the moment the model is called.
- EXT_SOURCE_1
- EXT_SOURCE_2
- EXT_SOURCE_3
- OBS_30_CNT_SOCIAL_CIRCLE
- DEF_30_CNT_SOCIAL_CIRCLE
- OBS_60_CNT_SOCIAL_CIRCLE
- DEF_60_CNT_SOCIAL_CIRCLE
EXT_SOURCE_1, EXT_SOURCE_2 and EXT_SOURCE_3 are normalized scores from an external data source. OBS_30_CNT_SOCIAL_CIRCLE, DEF_30_CNT_SOCIAL_CIRCLE, OBS_60_CNT_SOCIAL_CIRCLE and DEF_60_CNT_SOCIAL_CIRCLE describe defaults in the client's social circle (30 or 60 days past due).
In this case we do not drop these columns from the DataFrame, since we assume the client provides this data directly, with no need to turn to external sources or historical records to obtain it.

Exploring the target variable

In [6]:
# Share of each unique value of the target variable TARGET (%)
df_proporcion = df['TARGET']\
        .value_counts(normalize=True)\
        .mul(100).rename("%").reset_index()

# Count of each unique value of the variable
df_conteo = df['TARGET'].value_counts().reset_index()

# Combine the two DataFrames generated above on their shared TARGET column
df_proporcion_conteo = pd.merge(df_proporcion, df_conteo, how='inner', on='TARGET')
df_proporcion_conteo
Out[6]:
   TARGET          %   count
0       0  91.927118  282686
1       1   8.072882   24825
In [7]:
# Draw the bar chart
fig = px.bar(df_proporcion_conteo, x = "TARGET", y = "%",
             labels = {'TARGET': 'Target', '%': 'Percentage'},
             title = "Distribution of the target variable")

# Show the value for each TARGET class
fig.update_traces(text = df_proporcion_conteo['%'].round(2), # Round the figure to two decimals
                  textposition = 'inside',
                  texttemplate = '%{text}%',  # Display the value as a percentage
                  )
# Update the x-axis, replacing the values 0 and 1 with the client types
fig.update_xaxes(tickmode = 'array', tickvals = [0, 1],
                 ticktext = ['Type 0: Client without payment difficulties', 'Type 1: Client with payment difficulties'])
fig.update_yaxes(title_text="Percentage")
fig.show()
From the chart above, the probability that a randomly drawn observation corresponds to a client with payment difficulties is 8.07%.
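As a quick check of that figure, the percentage can be recomputed from the counts reported in Out[6]. The class ratio at the end is an extra illustrative quantity that the original notebook does not compute:

```python
# Counts taken from the Out[6] table.
count_no_difficulty = 282686
count_difficulty = 24825
total = count_no_difficulty + count_difficulty

# Share of clients with payment difficulties, as a percentage.
share_difficulty = 100 * count_difficulty / total

# How many clients without difficulties there are per client with difficulties.
ratio = count_no_difficulty / count_difficulty

print(f"{share_difficulty:.2f}% of clients have payment difficulties")
print(f"about {ratio:.1f} clients without difficulties per client with difficulties")
```

The classes are clearly imbalanced, which is worth keeping in mind when comparing groups later in the analysis.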

Selecting row and column thresholds for removing missing values

The function nulos_columna( ) shown below returns the number of null values in each column and the corresponding percentage.

In [8]:
f_aux.nulos_columna(df)
Out[8]:
nulos_columnas porcentaje_columnas
COMMONAREA_AVG 214865 69.872297
COMMONAREA_MODE 214865 69.872297
COMMONAREA_MEDI 214865 69.872297
NONLIVINGAPARTMENTS_AVG 213514 69.432963
NONLIVINGAPARTMENTS_MODE 213514 69.432963
NONLIVINGAPARTMENTS_MEDI 213514 69.432963
FONDKAPREMONT_MODE 210295 68.386172
LIVINGAPARTMENTS_MEDI 210199 68.354953
LIVINGAPARTMENTS_AVG 210199 68.354953
LIVINGAPARTMENTS_MODE 210199 68.354953
FLOORSMIN_AVG 208642 67.848630
FLOORSMIN_MODE 208642 67.848630
FLOORSMIN_MEDI 208642 67.848630
YEARS_BUILD_AVG 204488 66.497784
YEARS_BUILD_MEDI 204488 66.497784
YEARS_BUILD_MODE 204488 66.497784
OWN_CAR_AGE 202929 65.990810
LANDAREA_MEDI 182590 59.376738
LANDAREA_AVG 182590 59.376738
LANDAREA_MODE 182590 59.376738
BASEMENTAREA_MEDI 179943 58.515956
BASEMENTAREA_AVG 179943 58.515956
BASEMENTAREA_MODE 179943 58.515956
EXT_SOURCE_1 173378 56.381073
NONLIVINGAREA_AVG 169682 55.179164
NONLIVINGAREA_MODE 169682 55.179164
NONLIVINGAREA_MEDI 169682 55.179164
ELEVATORS_MODE 163891 53.295980
ELEVATORS_AVG 163891 53.295980
ELEVATORS_MEDI 163891 53.295980
WALLSMATERIAL_MODE 156341 50.840783
APARTMENTS_AVG 156061 50.749729
APARTMENTS_MEDI 156061 50.749729
APARTMENTS_MODE 156061 50.749729
ENTRANCES_AVG 154828 50.348768
ENTRANCES_MEDI 154828 50.348768
ENTRANCES_MODE 154828 50.348768
LIVINGAREA_MEDI 154350 50.193326
LIVINGAREA_MODE 154350 50.193326
LIVINGAREA_AVG 154350 50.193326
HOUSETYPE_MODE 154297 50.176091
FLOORSMAX_MEDI 153020 49.760822
FLOORSMAX_MODE 153020 49.760822
FLOORSMAX_AVG 153020 49.760822
YEARS_BEGINEXPLUATATION_MEDI 150007 48.781019
YEARS_BEGINEXPLUATATION_MODE 150007 48.781019
YEARS_BEGINEXPLUATATION_AVG 150007 48.781019
TOTALAREA_MODE 148431 48.268517
EMERGENCYSTATE_MODE 145755 47.398304
OCCUPATION_TYPE 96391 31.345545
EXT_SOURCE_3 60965 19.825307
AMT_REQ_CREDIT_BUREAU_WEEK 41519 13.501631
AMT_REQ_CREDIT_BUREAU_HOUR 41519 13.501631
AMT_REQ_CREDIT_BUREAU_MON 41519 13.501631
AMT_REQ_CREDIT_BUREAU_QRT 41519 13.501631
AMT_REQ_CREDIT_BUREAU_DAY 41519 13.501631
AMT_REQ_CREDIT_BUREAU_YEAR 41519 13.501631
NAME_TYPE_SUITE 1292 0.420148
DEF_30_CNT_SOCIAL_CIRCLE 1021 0.332021
OBS_60_CNT_SOCIAL_CIRCLE 1021 0.332021
OBS_30_CNT_SOCIAL_CIRCLE 1021 0.332021
DEF_60_CNT_SOCIAL_CIRCLE 1021 0.332021
EXT_SOURCE_2 660 0.214626
AMT_GOODS_PRICE 278 0.090403
AMT_ANNUITY 12 0.003902
CNT_FAM_MEMBERS 2 0.000650
DAYS_LAST_PHONE_CHANGE 1 0.000325
AMT_INCOME_TOTAL 0 0.000000
FLAG_DOCUMENT_8 0 0.000000
CODE_GENDER 0 0.000000
FLAG_OWN_CAR 0 0.000000
FLAG_OWN_REALTY 0 0.000000
FLAG_DOCUMENT_2 0 0.000000
FLAG_DOCUMENT_3 0 0.000000
FLAG_DOCUMENT_4 0 0.000000
FLAG_DOCUMENT_5 0 0.000000
FLAG_DOCUMENT_6 0 0.000000
FLAG_DOCUMENT_7 0 0.000000
FLAG_DOCUMENT_9 0 0.000000
FLAG_DOCUMENT_21 0 0.000000
FLAG_DOCUMENT_10 0 0.000000
FLAG_DOCUMENT_11 0 0.000000
CNT_CHILDREN 0 0.000000
FLAG_DOCUMENT_13 0 0.000000
FLAG_DOCUMENT_14 0 0.000000
FLAG_DOCUMENT_15 0 0.000000
FLAG_DOCUMENT_16 0 0.000000
FLAG_DOCUMENT_17 0 0.000000
FLAG_DOCUMENT_18 0 0.000000
FLAG_DOCUMENT_19 0 0.000000
FLAG_DOCUMENT_20 0 0.000000
FLAG_DOCUMENT_12 0 0.000000
AMT_CREDIT 0 0.000000
ORGANIZATION_TYPE 0 0.000000
NAME_INCOME_TYPE 0 0.000000
LIVE_CITY_NOT_WORK_CITY 0 0.000000
NAME_CONTRACT_TYPE 0 0.000000
REG_CITY_NOT_WORK_CITY 0 0.000000
REG_CITY_NOT_LIVE_CITY 0 0.000000
LIVE_REGION_NOT_WORK_REGION 0 0.000000
REG_REGION_NOT_WORK_REGION 0 0.000000
REG_REGION_NOT_LIVE_REGION 0 0.000000
HOUR_APPR_PROCESS_START 0 0.000000
WEEKDAY_APPR_PROCESS_START 0 0.000000
REGION_RATING_CLIENT_W_CITY 0 0.000000
REGION_RATING_CLIENT 0 0.000000
FLAG_EMAIL 0 0.000000
FLAG_PHONE 0 0.000000
FLAG_CONT_MOBILE 0 0.000000
FLAG_WORK_PHONE 0 0.000000
FLAG_EMP_PHONE 0 0.000000
FLAG_MOBIL 0 0.000000
DAYS_ID_PUBLISH 0 0.000000
DAYS_REGISTRATION 0 0.000000
DAYS_EMPLOYED 0 0.000000
DAYS_BIRTH 0 0.000000
REGION_POPULATION_RELATIVE 0 0.000000
NAME_HOUSING_TYPE 0 0.000000
NAME_FAMILY_STATUS 0 0.000000
NAME_EDUCATION_TYPE 0 0.000000
TARGET 0 0.000000
Based on the proportion of null values in each variable (that is, a missing-value threshold), we chose to keep all variables for now, because we consider it necessary to examine their relevance further by observing how they perform in the model.
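
The implementation of nulos_columna is not included in this notebook. A minimal sketch of a helper that produces a table with the same two columns, together with an illustrative threshold filter, might look as follows (the name null_summary, the toy data and the 40% threshold are assumptions made for this example):

```python
import numpy as np
import pandas as pd


def null_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, highest first."""
    summary = pd.DataFrame({
        "nulos_columnas": df.isna().sum(),
        "porcentaje_columnas": df.isna().mean() * 100,
    })
    return summary.sort_values("nulos_columnas", ascending=False)


# Toy frame, invented for illustration.
toy = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [1.0, 2.0, 3.0, 4.0],
    "c": [np.nan, 2.0, 3.0, 4.0],
})
summary = null_summary(toy)

# Hypothetical threshold: columns above it would be candidates for removal
# if one decided to filter by missingness.
threshold = 40.0
candidates = summary.index[summary["porcentaje_columnas"] > threshold].tolist()

print(summary)
print(candidates)
```

Keeping the threshold as a separate step makes it easy to defer the decision, as done here, and revisit it once model performance is known.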

Initial preprocessing of some variables

Here a categorical variable stored as text is converted to a numeric representation, to simplify its processing and make it more efficient to handle in the analysis.
In [9]:
dia = { "MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4, "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

df['NWEEKDAY_PROCESS_START'] = df['WEEKDAY_APPR_PROCESS_START'].replace(dia)
/var/folders/bj/jtm72vws3zncs7grnzmn95yh0000gn/T/ipykernel_12087/367575524.py:3: FutureWarning:

Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
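
The FutureWarning raised when building NWEEKDAY_PROCESS_START comes from replace turning a text column into integers. One way to sidestep it, sketched here on a toy Series and assuming every value in the column is a key of the dictionary, is Series.map, which builds the new column directly from the lookup:

```python
import pandas as pd

dia = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
       "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# Toy stand-in for the WEEKDAY_APPR_PROCESS_START column.
weekday = pd.Series(["WEDNESDAY", "MONDAY", "MONDAY", "THURSDAY"])

# map() looks each value up in the dictionary; values that are not keys
# become NaN instead of being left unchanged, as replace() would leave them.
nweekday = weekday.map(dia)

print(nweekday.tolist())
```

The difference in how unmapped values are handled is the main thing to check before swapping one method for the other.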

In [10]:
# The new variable was added to the DataFrame in the previous cell; here we verify that it was created.
nueva_columna = 'NWEEKDAY_PROCESS_START' in df.columns
nueva_columna
Out[10]:
True
In [11]:
# Drop the original column that has already been preprocessed and verify that it is no longer in the DataFrame.
df.drop("WEEKDAY_APPR_PROCESS_START", axis=1, inplace=True)
columna_existe = 'WEEKDAY_APPR_PROCESS_START' in df.columns
columna_existe
Out[11]:
False

Some boolean variables are stored as text strings such as 'YES', 'NO', 'y', 'n', or other variations. The function valores_booleanos( ) replaces these values with 1 for affirmative cases and 0 for negative ones.

In [12]:
f_aux.valores_booleanos(df)
/Users/miguelflores/Desktop/P1/practica1/funciones/funciones_auxiliares.py:82: FutureWarning:

Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`

Out[12]:
TARGET NAME_CONTRACT_TYPE CODE_GENDER FLAG_OWN_CAR FLAG_OWN_REALTY CNT_CHILDREN AMT_INCOME_TOTAL AMT_CREDIT AMT_ANNUITY AMT_GOODS_PRICE NAME_TYPE_SUITE NAME_INCOME_TYPE NAME_EDUCATION_TYPE NAME_FAMILY_STATUS NAME_HOUSING_TYPE REGION_POPULATION_RELATIVE DAYS_BIRTH DAYS_EMPLOYED DAYS_REGISTRATION DAYS_ID_PUBLISH OWN_CAR_AGE FLAG_MOBIL FLAG_EMP_PHONE FLAG_WORK_PHONE FLAG_CONT_MOBILE FLAG_PHONE FLAG_EMAIL OCCUPATION_TYPE CNT_FAM_MEMBERS REGION_RATING_CLIENT REGION_RATING_CLIENT_W_CITY HOUR_APPR_PROCESS_START REG_REGION_NOT_LIVE_REGION REG_REGION_NOT_WORK_REGION LIVE_REGION_NOT_WORK_REGION REG_CITY_NOT_LIVE_CITY REG_CITY_NOT_WORK_CITY LIVE_CITY_NOT_WORK_CITY ORGANIZATION_TYPE EXT_SOURCE_1 EXT_SOURCE_2 EXT_SOURCE_3 APARTMENTS_AVG BASEMENTAREA_AVG YEARS_BEGINEXPLUATATION_AVG YEARS_BUILD_AVG COMMONAREA_AVG ELEVATORS_AVG ENTRANCES_AVG FLOORSMAX_AVG FLOORSMIN_AVG LANDAREA_AVG LIVINGAPARTMENTS_AVG LIVINGAREA_AVG NONLIVINGAPARTMENTS_AVG NONLIVINGAREA_AVG APARTMENTS_MODE BASEMENTAREA_MODE YEARS_BEGINEXPLUATATION_MODE YEARS_BUILD_MODE COMMONAREA_MODE ELEVATORS_MODE ENTRANCES_MODE FLOORSMAX_MODE FLOORSMIN_MODE LANDAREA_MODE LIVINGAPARTMENTS_MODE LIVINGAREA_MODE NONLIVINGAPARTMENTS_MODE NONLIVINGAREA_MODE APARTMENTS_MEDI BASEMENTAREA_MEDI YEARS_BEGINEXPLUATATION_MEDI YEARS_BUILD_MEDI COMMONAREA_MEDI ELEVATORS_MEDI ENTRANCES_MEDI FLOORSMAX_MEDI FLOORSMIN_MEDI LANDAREA_MEDI LIVINGAPARTMENTS_MEDI LIVINGAREA_MEDI NONLIVINGAPARTMENTS_MEDI NONLIVINGAREA_MEDI FONDKAPREMONT_MODE HOUSETYPE_MODE TOTALAREA_MODE WALLSMATERIAL_MODE EMERGENCYSTATE_MODE OBS_30_CNT_SOCIAL_CIRCLE DEF_30_CNT_SOCIAL_CIRCLE OBS_60_CNT_SOCIAL_CIRCLE DEF_60_CNT_SOCIAL_CIRCLE DAYS_LAST_PHONE_CHANGE FLAG_DOCUMENT_2 FLAG_DOCUMENT_3 FLAG_DOCUMENT_4 FLAG_DOCUMENT_5 FLAG_DOCUMENT_6 FLAG_DOCUMENT_7 FLAG_DOCUMENT_8 FLAG_DOCUMENT_9 FLAG_DOCUMENT_10 FLAG_DOCUMENT_11 FLAG_DOCUMENT_12 FLAG_DOCUMENT_13 FLAG_DOCUMENT_14 FLAG_DOCUMENT_15 FLAG_DOCUMENT_16 FLAG_DOCUMENT_17 FLAG_DOCUMENT_18 FLAG_DOCUMENT_19 FLAG_DOCUMENT_20 
FLAG_DOCUMENT_21 AMT_REQ_CREDIT_BUREAU_HOUR AMT_REQ_CREDIT_BUREAU_DAY AMT_REQ_CREDIT_BUREAU_WEEK AMT_REQ_CREDIT_BUREAU_MON AMT_REQ_CREDIT_BUREAU_QRT AMT_REQ_CREDIT_BUREAU_YEAR NWEEKDAY_PROCESS_START
SK_ID_CURR
100002 1 Cash loans M 0 1 0 202500.0 406597.5 24700.5 351000.0 Unaccompanied Working Secondary / secondary special Single / not married House / apartment 0.018801 -9461 -637 -3648.0 -2120 NaN 1 1 0 1 1 0 Laborers 1.0 2 2 10 0 0 0 0 0 0 Business Entity Type 3 0.083037 0.262949 0.139376 0.0247 0.0369 0.9722 0.6192 0.0143 0.00 0.0690 0.0833 0.1250 0.0369 0.0202 0.0190 0.0000 0.0000 0.0252 0.0383 0.9722 0.6341 0.0144 0.0000 0.0690 0.0833 0.1250 0.0377 0.0220 0.0198 0.0 0.0000 0.0250 0.0369 0.9722 0.6243 0.0144 0.00 0.0690 0.0833 0.1250 0.0375 0.0205 0.0193 0.0000 0.0000 reg oper account block of flats 0.0149 Stone, brick No 2.0 2.0 2.0 2.0 -1134.0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 1.0 3
100003 0 Cash loans F 0 0 0 270000.0 1293502.5 35698.5 1129500.0 Family State servant Higher education Married House / apartment 0.003541 -16765 -1188 -1186.0 -291 NaN 1 1 0 1 1 0 Core staff 2.0 1 1 11 0 0 0 0 0 0 School 0.311267 0.622246 NaN 0.0959 0.0529 0.9851 0.7960 0.0605 0.08 0.0345 0.2917 0.3333 0.0130 0.0773 0.0549 0.0039 0.0098 0.0924 0.0538 0.9851 0.8040 0.0497 0.0806 0.0345 0.2917 0.3333 0.0128 0.0790 0.0554 0.0 0.0000 0.0968 0.0529 0.9851 0.7987 0.0608 0.08 0.0345 0.2917 0.3333 0.0132 0.0787 0.0558 0.0039 0.0100 reg oper account block of flats 0.0714 Block No 1.0 0.0 1.0 0.0 -828.0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 1
100004 0 Revolving loans M 1 1 0 67500.0 135000.0 6750.0 135000.0 Unaccompanied Working Secondary / secondary special Single / not married House / apartment 0.010032 -19046 -225 -4260.0 -2531 26.0 1 1 1 1 1 0 Laborers 1.0 2 2 9 0 0 0 0 0 0 Government NaN 0.555912 0.729567 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 0.0 0.0 0.0 -815.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 1
100006 0 Cash loans F 0 1 0 135000.0 312682.5 29686.5 297000.0 Unaccompanied Working Secondary / secondary special Civil marriage House / apartment 0.008019 -19005 -3039 -9833.0 -2437 NaN 1 1 0 1 0 0 Laborers 2.0 2 2 17 0 0 0 0 0 0 Business Entity Type 3 NaN 0.650442 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2.0 0.0 2.0 0.0 -617.0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN NaN NaN NaN NaN NaN 3
100007 0 Cash loans M 0 1 0 121500.0 513000.0 21865.5 513000.0 Unaccompanied Working Secondary / secondary special Single / not married House / apartment 0.028663 -19932 -3038 -4311.0 -3458 NaN 1 1 0 1 0 0 Core staff 1.0 2 2 11 0 0 0 0 1 1 Religion NaN 0.322738 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 0.0 0.0 0.0 -1106.0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 4
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
456251 0 Cash loans M 0 0 0 157500.0 254700.0 27558.0 225000.0 Unaccompanied Working Secondary / secondary special Separated With parents 0.032561 -9327 -236 -8456.0 -1982 NaN 1 1 0 1 0 0 Sales staff 1.0 1 1 15 0 0 0 0 0 0 Services 0.145570 0.681632 NaN 0.2021 0.0887 0.9876 0.8300 0.0202 0.22 0.1034 0.6042 0.2708 0.0594 0.1484 0.1965 0.0753 0.1095 0.1008 0.0172 0.9782 0.7125 0.0172 0.0806 0.0345 0.4583 0.0417 0.0094 0.0882 0.0853 0.0 0.0125 0.2040 0.0887 0.9876 0.8323 0.0203 0.22 0.1034 0.6042 0.2708 0.0605 0.1509 0.2001 0.0757 0.1118 reg oper account block of flats 0.2898 Stone, brick No 0.0 0.0 0.0 0.0 -273.0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN NaN NaN NaN NaN NaN 4
456252 0 Cash loans F 0 1 0 72000.0 269550.0 12001.5 225000.0 Unaccompanied Pensioner Secondary / secondary special Widow House / apartment 0.025164 -20775 365243 -4388.0 -4090 NaN 1 0 0 1 1 0 NaN 1.0 2 2 8 0 0 0 0 0 0 XNA NaN 0.115992 NaN 0.0247 0.0435 0.9727 0.6260 0.0022 0.00 0.1034 0.0833 0.1250 0.0579 0.0202 0.0257 0.0000 0.0000 0.0252 0.0451 0.9727 0.6406 0.0022 0.0000 0.1034 0.0833 0.1250 0.0592 0.0220 0.0267 0.0 0.0000 0.0250 0.0435 0.9727 0.6310 0.0022 0.00 0.1034 0.0833 0.1250 0.0589 0.0205 0.0261 0.0000 0.0000 reg oper account block of flats 0.0214 Stone, brick No 0.0 0.0 0.0 0.0 0.0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN NaN NaN NaN NaN NaN 1
456253 0 Cash loans F 0 1 0 153000.0 677664.0 29979.0 585000.0 Unaccompanied Working Higher education Separated House / apartment 0.005002 -14966 -7921 -6737.0 -5150 NaN 1 1 0 1 0 1 Managers 1.0 3 3 9 0 0 0 0 1 1 School 0.744026 0.535722 0.218859 0.1031 0.0862 0.9816 0.7484 0.0123 0.00 0.2069 0.1667 0.2083 NaN 0.0841 0.9279 0.0000 0.0000 0.1050 0.0894 0.9816 0.7583 0.0124 0.0000 0.2069 0.1667 0.2083 NaN 0.0918 0.9667 0.0 0.0000 0.1041 0.0862 0.9816 0.7518 0.0124 0.00 0.2069 0.1667 0.2083 NaN 0.0855 0.9445 0.0000 0.0000 reg oper account block of flats 0.7970 Panel No 6.0 0.0 6.0 0.0 -1909.0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0 0.0 0.0 1.0 0.0 1.0 4
456254 1 Cash loans F 0 1 0 171000.0 370107.0 20205.0 319500.0 Unaccompanied Commercial associate Secondary / secondary special Married House / apartment 0.005313 -11961 -4786 -2562.0 -931 NaN 1 1 0 1 0 0 Laborers 2.0 2 2 9 0 0 0 1 1 0 Business Entity Type 1 NaN 0.514163 0.661024 0.0124 NaN 0.9771 NaN NaN NaN 0.0690 0.0417 NaN NaN NaN 0.0061 NaN NaN 0.0126 NaN 0.9772 NaN NaN NaN 0.0690 0.0417 NaN NaN NaN 0.0063 NaN NaN 0.0125 NaN 0.9771 NaN NaN NaN 0.0690 0.0417 NaN NaN NaN 0.0062 NaN NaN NaN block of flats 0.0086 Stone, brick No 0.0 0.0 0.0 0.0 -322.0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 3
456255 0 Cash loans F 0 0 0 157500.0 675000.0 49117.5 675000.0 Unaccompanied Commercial associate Higher education Married House / apartment 0.046220 -16856 -1262 -5128.0 -410 NaN 1 1 1 1 1 0 Laborers 2.0 1 1 20 0 0 0 0 1 1 Business Entity Type 3 0.734460 0.708569 0.113922 0.0742 0.0526 0.9881 NaN 0.0176 0.08 0.0690 0.3750 NaN NaN NaN 0.0791 NaN 0.0000 0.0756 0.0546 0.9881 NaN 0.0178 0.0806 0.0690 0.3750 NaN NaN NaN 0.0824 NaN 0.0000 0.0749 0.0526 0.9881 NaN 0.0177 0.08 0.0690 0.3750 NaN NaN NaN 0.0805 NaN 0.0000 NaN block of flats 0.0718 Panel No 0.0 0.0 0.0 0.0 -787.0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 2.0 0.0 1.0 4

307511 rows × 121 columns

Variable types: categorical, continuous, and boolean

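Using the function clasificar_variables(), we assign each variable to a type: boolean, categorical, or continuous. Variables that do not fit any of these groups are collected in a separate list of unclassified variables. The function prints a summary and returns the four lists in that order (booleans, categoricals, continuous, unclassified).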
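
The body of clasificar_variables() is not shown in this notebook, so the following is only a minimal sketch of how such a classifier could work. The function name classify_variables_sketch, the max_categories threshold, and the rules themselves (two distinct values means boolean, non-numeric or low cardinality means categorical, float means continuous) are assumptions made for illustration, not the rules used in funciones_auxiliares.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_numeric_dtype


def classify_variables_sketch(df: pd.DataFrame, max_categories: int = 5):
    """Hypothetical stand-in for f_aux.clasificar_variables (rules are assumed)."""
    booleans, categoricals, continuous, unclassified = [], [], [], []
    for col in df.columns:
        n_unique = df[col].nunique(dropna=True)
        if n_unique == 2:
            # Exactly two distinct values: treat as boolean.
            booleans.append(col)
        elif not is_numeric_dtype(df[col]) or n_unique <= max_categories:
            # Text columns, or numeric columns with few distinct values.
            categoricals.append(col)
        elif is_float_dtype(df[col]):
            continuous.append(col)
        else:
            # Integer columns with many distinct values need a manual decision.
            unclassified.append(col)
    return booleans, categoricals, continuous, unclassified


# Toy data that reuses a few column names from the dataset; the values are made up.
toy = pd.DataFrame({
    "TARGET": [0, 1, 0, 0, 1, 0, 0, 0],
    "NAME_FAMILY_STATUS": ["Married", "Widow", "Separated", "Married",
                           "Married", "Widow", "Separated", "Married"],
    "REGION_RATING_CLIENT": [2, 1, 2, 2, 3, 1, 2, 3],
    "AMT_CREDIT": [406597.5, 1293502.5, 135000.0, 312682.5,
                   513000.0, 254700.0, 269550.0, 677664.0],
    "DAYS_BIRTH": [-9461, -16765, -19046, -19005, -19932, -9327, -20775, -14966],
})

booleans, categoricals, continuous, unclassified = classify_variables_sketch(toy)
```

Under these assumed rules, an integer column with many distinct values such as DAYS_BIRTH falls through to the unclassified list, which mirrors the kind of leftover variables reported in the real output.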

In [13]:
f_aux.clasificar_variables(df)
Variables Booleanas: 36 ['TARGET', 'FLAG_OWN_CAR', 'FLAG_OWN_REALTY', 'FLAG_MOBIL', 'FLAG_EMP_PHONE', 'FLAG_WORK_PHONE', 'FLAG_CONT_MOBILE', 'FLAG_PHONE', 'FLAG_EMAIL', 'REG_REGION_NOT_LIVE_REGION', 'REG_REGION_NOT_WORK_REGION', 'LIVE_REGION_NOT_WORK_REGION', 'REG_CITY_NOT_LIVE_CITY', 'REG_CITY_NOT_WORK_CITY', 'LIVE_CITY_NOT_WORK_CITY', 'EMERGENCYSTATE_MODE', 'FLAG_DOCUMENT_2', 'FLAG_DOCUMENT_3', 'FLAG_DOCUMENT_4', 'FLAG_DOCUMENT_5', 'FLAG_DOCUMENT_6', 'FLAG_DOCUMENT_7', 'FLAG_DOCUMENT_8', 'FLAG_DOCUMENT_9', 'FLAG_DOCUMENT_10', 'FLAG_DOCUMENT_11', 'FLAG_DOCUMENT_12', 'FLAG_DOCUMENT_13', 'FLAG_DOCUMENT_14', 'FLAG_DOCUMENT_15', 'FLAG_DOCUMENT_16', 'FLAG_DOCUMENT_17', 'FLAG_DOCUMENT_18', 'FLAG_DOCUMENT_19', 'FLAG_DOCUMENT_20', 'FLAG_DOCUMENT_21']
============================================================================================================================================================================
Variables Categóricas: 14 ['NAME_CONTRACT_TYPE', 'CODE_GENDER', 'NAME_TYPE_SUITE', 'NAME_INCOME_TYPE', 'NAME_EDUCATION_TYPE', 'NAME_FAMILY_STATUS', 'NAME_HOUSING_TYPE', 'OCCUPATION_TYPE', 'REGION_RATING_CLIENT', 'REGION_RATING_CLIENT_W_CITY', 'ORGANIZATION_TYPE', 'FONDKAPREMONT_MODE', 'HOUSETYPE_MODE', 'WALLSMATERIAL_MODE']
============================================================================================================================================================================
Variables Continuas: 65 ['AMT_INCOME_TOTAL', 'AMT_CREDIT', 'AMT_ANNUITY', 'AMT_GOODS_PRICE', 'REGION_POPULATION_RELATIVE', 'DAYS_REGISTRATION', 'OWN_CAR_AGE', 'CNT_FAM_MEMBERS', 'EXT_SOURCE_1', 'EXT_SOURCE_2', 'EXT_SOURCE_3', 'APARTMENTS_AVG', 'BASEMENTAREA_AVG', 'YEARS_BEGINEXPLUATATION_AVG', 'YEARS_BUILD_AVG', 'COMMONAREA_AVG', 'ELEVATORS_AVG', 'ENTRANCES_AVG', 'FLOORSMAX_AVG', 'FLOORSMIN_AVG', 'LANDAREA_AVG', 'LIVINGAPARTMENTS_AVG', 'LIVINGAREA_AVG', 'NONLIVINGAPARTMENTS_AVG', 'NONLIVINGAREA_AVG', 'APARTMENTS_MODE', 'BASEMENTAREA_MODE', 'YEARS_BEGINEXPLUATATION_MODE', 'YEARS_BUILD_MODE', 'COMMONAREA_MODE', 'ELEVATORS_MODE', 'ENTRANCES_MODE', 'FLOORSMAX_MODE', 'FLOORSMIN_MODE', 'LANDAREA_MODE', 'LIVINGAPARTMENTS_MODE', 'LIVINGAREA_MODE', 'NONLIVINGAPARTMENTS_MODE', 'NONLIVINGAREA_MODE', 'APARTMENTS_MEDI', 'BASEMENTAREA_MEDI', 'YEARS_BEGINEXPLUATATION_MEDI', 'YEARS_BUILD_MEDI', 'COMMONAREA_MEDI', 'ELEVATORS_MEDI', 'ENTRANCES_MEDI', 'FLOORSMAX_MEDI', 'FLOORSMIN_MEDI', 'LANDAREA_MEDI', 'LIVINGAPARTMENTS_MEDI', 'LIVINGAREA_MEDI', 'NONLIVINGAPARTMENTS_MEDI', 'NONLIVINGAREA_MEDI', 'TOTALAREA_MODE', 'OBS_30_CNT_SOCIAL_CIRCLE', 'DEF_30_CNT_SOCIAL_CIRCLE', 'OBS_60_CNT_SOCIAL_CIRCLE', 'DEF_60_CNT_SOCIAL_CIRCLE', 'DAYS_LAST_PHONE_CHANGE', 'AMT_REQ_CREDIT_BUREAU_HOUR', 'AMT_REQ_CREDIT_BUREAU_DAY', 'AMT_REQ_CREDIT_BUREAU_WEEK', 'AMT_REQ_CREDIT_BUREAU_MON', 'AMT_REQ_CREDIT_BUREAU_QRT', 'AMT_REQ_CREDIT_BUREAU_YEAR']
============================================================================================================================================================================
Variables no clasificadas: 6 ['CNT_CHILDREN', 'DAYS_BIRTH', 'DAYS_EMPLOYED', 'DAYS_ID_PUBLISH', 'HOUR_APPR_PROCESS_START', 'NWEEKDAY_PROCESS_START']
Out[13]:
(['TARGET',
  'FLAG_OWN_CAR',
  'FLAG_OWN_REALTY',
  'FLAG_MOBIL',
  'FLAG_EMP_PHONE',
  'FLAG_WORK_PHONE',
  'FLAG_CONT_MOBILE',
  'FLAG_PHONE',
  'FLAG_EMAIL',
  'REG_REGION_NOT_LIVE_REGION',
  'REG_REGION_NOT_WORK_REGION',
  'LIVE_REGION_NOT_WORK_REGION',
  'REG_CITY_NOT_LIVE_CITY',
  'REG_CITY_NOT_WORK_CITY',
  'LIVE_CITY_NOT_WORK_CITY',
  'EMERGENCYSTATE_MODE',
  'FLAG_DOCUMENT_2',
  'FLAG_DOCUMENT_3',
  'FLAG_DOCUMENT_4',
  'FLAG_DOCUMENT_5',
  'FLAG_DOCUMENT_6',
  'FLAG_DOCUMENT_7',
  'FLAG_DOCUMENT_8',
  'FLAG_DOCUMENT_9',
  'FLAG_DOCUMENT_10',
  'FLAG_DOCUMENT_11',
  'FLAG_DOCUMENT_12',
  'FLAG_DOCUMENT_13',
  'FLAG_DOCUMENT_14',
  'FLAG_DOCUMENT_15',
  'FLAG_DOCUMENT_16',
  'FLAG_DOCUMENT_17',
  'FLAG_DOCUMENT_18',
  'FLAG_DOCUMENT_19',
  'FLAG_DOCUMENT_20',
  'FLAG_DOCUMENT_21'],
 ['NAME_CONTRACT_TYPE',
  'CODE_GENDER',
  'NAME_TYPE_SUITE',
  'NAME_INCOME_TYPE',
  'NAME_EDUCATION_TYPE',
  'NAME_FAMILY_STATUS',
  'NAME_HOUSING_TYPE',
  'OCCUPATION_TYPE',
  'REGION_RATING_CLIENT',
  'REGION_RATING_CLIENT_W_CITY',
  'ORGANIZATION_TYPE',
  'FONDKAPREMONT_MODE',
  'HOUSETYPE_MODE',
  'WALLSMATERIAL_MODE'],
 ['AMT_INCOME_TOTAL',
  'AMT_CREDIT',
  'AMT_ANNUITY',
  'AMT_GOODS_PRICE',
  'REGION_POPULATION_RELATIVE',
  'DAYS_REGISTRATION',
  'OWN_CAR_AGE',
  'CNT_FAM_MEMBERS',
  'EXT_SOURCE_1',
  'EXT_SOURCE_2',
  'EXT_SOURCE_3',
  'APARTMENTS_AVG',
  'BASEMENTAREA_AVG',
  'YEARS_BEGINEXPLUATATION_AVG',
  'YEARS_BUILD_AVG',
  'COMMONAREA_AVG',
  'ELEVATORS_AVG',
  'ENTRANCES_AVG',
  'FLOORSMAX_AVG',
  'FLOORSMIN_AVG',
  'LANDAREA_AVG',
  'LIVINGAPARTMENTS_AVG',
  'LIVINGAREA_AVG',
  'NONLIVINGAPARTMENTS_AVG',
  'NONLIVINGAREA_AVG',
  'APARTMENTS_MODE',
  'BASEMENTAREA_MODE',
  'YEARS_BEGINEXPLUATATION_MODE',
  'YEARS_BUILD_MODE',
  'COMMONAREA_MODE',
  'ELEVATORS_MODE',
  'ENTRANCES_MODE',
  'FLOORSMAX_MODE',
  'FLOORSMIN_MODE',
  'LANDAREA_MODE',
  'LIVINGAPARTMENTS_MODE',
  'LIVINGAREA_MODE',
  'NONLIVINGAPARTMENTS_MODE',
  'NONLIVINGAREA_MODE',
  'APARTMENTS_MEDI',
  'BASEMENTAREA_MEDI',
  'YEARS_BEGINEXPLUATATION_MEDI',
  'YEARS_BUILD_MEDI',
  'COMMONAREA_MEDI',
  'ELEVATORS_MEDI',
  'ENTRANCES_MEDI',
  'FLOORSMAX_MEDI',
  'FLOORSMIN_MEDI',
  'LANDAREA_MEDI',
  'LIVINGAPARTMENTS_MEDI',
  'LIVINGAREA_MEDI',
  'NONLIVINGAPARTMENTS_MEDI',
  'NONLIVINGAREA_MEDI',
  'TOTALAREA_MODE',
  'OBS_30_CNT_SOCIAL_CIRCLE',
  'DEF_30_CNT_SOCIAL_CIRCLE',
  'OBS_60_CNT_SOCIAL_CIRCLE',
  'DEF_60_CNT_SOCIAL_CIRCLE',
  'DAYS_LAST_PHONE_CHANGE',
  'AMT_REQ_CREDIT_BUREAU_HOUR',
  'AMT_REQ_CREDIT_BUREAU_DAY',
  'AMT_REQ_CREDIT_BUREAU_WEEK',
  'AMT_REQ_CREDIT_BUREAU_MON',
  'AMT_REQ_CREDIT_BUREAU_QRT',
  'AMT_REQ_CREDIT_BUREAU_YEAR'],
 ['CNT_CHILDREN',
  'DAYS_BIRTH',
  'DAYS_EMPLOYED',
  'DAYS_ID_PUBLISH',
  'HOUR_APPR_PROCESS_START',
  'NWEEKDAY_PROCESS_START'])
In [14]:
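# Unpack in the order the function returns its lists (see Out[13]):
# booleans, categoricals, continuous, unclassified.
lista_var_bool, lista_var_cat, lista_var_con, lista_var_no_clasificadas = f_aux.clasificar_variables(df)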
Variables Booleanas: 36 ['TARGET', 'FLAG_OWN_CAR', 'FLAG_OWN_REALTY', 'FLAG_MOBIL', 'FLAG_EMP_PHONE', 'FLAG_WORK_PHONE', 'FLAG_CONT_MOBILE', 'FLAG_PHONE', 'FLAG_EMAIL', 'REG_REGION_NOT_LIVE_REGION', 'REG_REGION_NOT_WORK_REGION', 'LIVE_REGION_NOT_WORK_REGION', 'REG_CITY_NOT_LIVE_CITY', 'REG_CITY_NOT_WORK_CITY', 'LIVE_CITY_NOT_WORK_CITY', 'EMERGENCYSTATE_MODE', 'FLAG_DOCUMENT_2', 'FLAG_DOCUMENT_3', 'FLAG_DOCUMENT_4', 'FLAG_DOCUMENT_5', 'FLAG_DOCUMENT_6', 'FLAG_DOCUMENT_7', 'FLAG_DOCUMENT_8', 'FLAG_DOCUMENT_9', 'FLAG_DOCUMENT_10', 'FLAG_DOCUMENT_11', 'FLAG_DOCUMENT_12', 'FLAG_DOCUMENT_13', 'FLAG_DOCUMENT_14', 'FLAG_DOCUMENT_15', 'FLAG_DOCUMENT_16', 'FLAG_DOCUMENT_17', 'FLAG_DOCUMENT_18', 'FLAG_DOCUMENT_19', 'FLAG_DOCUMENT_20', 'FLAG_DOCUMENT_21']
============================================================================================================================================================================
Variables Categóricas: 14 ['NAME_CONTRACT_TYPE', 'CODE_GENDER', 'NAME_TYPE_SUITE', 'NAME_INCOME_TYPE', 'NAME_EDUCATION_TYPE', 'NAME_FAMILY_STATUS', 'NAME_HOUSING_TYPE', 'OCCUPATION_TYPE', 'REGION_RATING_CLIENT', 'REGION_RATING_CLIENT_W_CITY', 'ORGANIZATION_TYPE', 'FONDKAPREMONT_MODE', 'HOUSETYPE_MODE', 'WALLSMATERIAL_MODE']
============================================================================================================================================================================
Variables Continuas: 65 ['AMT_INCOME_TOTAL', 'AMT_CREDIT', 'AMT_ANNUITY', 'AMT_GOODS_PRICE', 'REGION_POPULATION_RELATIVE', 'DAYS_REGISTRATION', 'OWN_CAR_AGE', 'CNT_FAM_MEMBERS', 'EXT_SOURCE_1', 'EXT_SOURCE_2', 'EXT_SOURCE_3', 'APARTMENTS_AVG', 'BASEMENTAREA_AVG', 'YEARS_BEGINEXPLUATATION_AVG', 'YEARS_BUILD_AVG', 'COMMONAREA_AVG', 'ELEVATORS_AVG', 'ENTRANCES_AVG', 'FLOORSMAX_AVG', 'FLOORSMIN_AVG', 'LANDAREA_AVG', 'LIVINGAPARTMENTS_AVG', 'LIVINGAREA_AVG', 'NONLIVINGAPARTMENTS_AVG', 'NONLIVINGAREA_AVG', 'APARTMENTS_MODE', 'BASEMENTAREA_MODE', 'YEARS_BEGINEXPLUATATION_MODE', 'YEARS_BUILD_MODE', 'COMMONAREA_MODE', 'ELEVATORS_MODE', 'ENTRANCES_MODE', 'FLOORSMAX_MODE', 'FLOORSMIN_MODE', 'LANDAREA_MODE', 'LIVINGAPARTMENTS_MODE', 'LIVINGAREA_MODE', 'NONLIVINGAPARTMENTS_MODE', 'NONLIVINGAREA_MODE', 'APARTMENTS_MEDI', 'BASEMENTAREA_MEDI', 'YEARS_BEGINEXPLUATATION_MEDI', 'YEARS_BUILD_MEDI', 'COMMONAREA_MEDI', 'ELEVATORS_MEDI', 'ENTRANCES_MEDI', 'FLOORSMAX_MEDI', 'FLOORSMIN_MEDI', 'LANDAREA_MEDI', 'LIVINGAPARTMENTS_MEDI', 'LIVINGAREA_MEDI', 'NONLIVINGAPARTMENTS_MEDI', 'NONLIVINGAREA_MEDI', 'TOTALAREA_MODE', 'OBS_30_CNT_SOCIAL_CIRCLE', 'DEF_30_CNT_SOCIAL_CIRCLE', 'OBS_60_CNT_SOCIAL_CIRCLE', 'DEF_60_CNT_SOCIAL_CIRCLE', 'DAYS_LAST_PHONE_CHANGE', 'AMT_REQ_CREDIT_BUREAU_HOUR', 'AMT_REQ_CREDIT_BUREAU_DAY', 'AMT_REQ_CREDIT_BUREAU_WEEK', 'AMT_REQ_CREDIT_BUREAU_MON', 'AMT_REQ_CREDIT_BUREAU_QRT', 'AMT_REQ_CREDIT_BUREAU_YEAR']
============================================================================================================================================================================
Variables no clasificadas: 6 ['CNT_CHILDREN', 'DAYS_BIRTH', 'DAYS_EMPLOYED', 'DAYS_ID_PUBLISH', 'HOUR_APPR_PROCESS_START', 'NWEEKDAY_PROCESS_START']

To handle the unclassified variables, we wrote the function nueva_clasificar_variables(). It follows the same format as the previous function, but adds a loop that iterates over the unclassified variables and assigns each one to its appropriate type.
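
The reassignment loop could look roughly like the sketch below. The helper reassign_unclassified and the ASSIGNMENT dictionary are hypothetical names introduced here; the mapping itself is taken from the result printed by nueva_clasificar_variables(df), where CNT_CHILDREN and NWEEKDAY_PROCESS_START end up as categorical and the remaining four variables as continuous. How the real function decides each destination is not shown, so an explicit mapping is assumed.

```python
# Destination of each leftover variable, as observed in the notebook output.
ASSIGNMENT = {
    "CNT_CHILDREN": "categorical",
    "NWEEKDAY_PROCESS_START": "categorical",
    "DAYS_BIRTH": "continuous",
    "DAYS_EMPLOYED": "continuous",
    "DAYS_ID_PUBLISH": "continuous",
    "HOUR_APPR_PROCESS_START": "continuous",
}


def reassign_unclassified(categoricals, continuous, unclassified, assignment):
    """Hypothetical helper: move each unclassified variable to its assigned list."""
    categoricals = list(categoricals)  # copy so the caller's lists are not mutated
    continuous = list(continuous)
    remaining = []
    for col in unclassified:
        target = assignment.get(col)
        if target == "categorical":
            categoricals.append(col)
        elif target == "continuous":
            continuous.append(col)
        else:
            # No rule for this variable: keep it visible instead of guessing.
            remaining.append(col)
    return categoricals, continuous, remaining


new_cat, new_con, still_unclassified = reassign_unclassified(
    ["CODE_GENDER"],   # shortened categorical list, for illustration only
    ["AMT_CREDIT"],    # shortened continuous list, for illustration only
    ["CNT_CHILDREN", "DAYS_BIRTH", "DAYS_EMPLOYED", "DAYS_ID_PUBLISH",
     "HOUR_APPR_PROCESS_START", "NWEEKDAY_PROCESS_START"],
    ASSIGNMENT,
)
```

Keeping a list of variables that still have no rule makes it easy to check that nothing is left unclassified, which is what the final "Variables no clasificadas: 0 []" line confirms in the real run.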

In [15]:
f_aux.nueva_clasificar_variables(df)
Variables Booleanas: 36 ['TARGET', 'FLAG_OWN_CAR', 'FLAG_OWN_REALTY', 'FLAG_MOBIL', 'FLAG_EMP_PHONE', 'FLAG_WORK_PHONE', 'FLAG_CONT_MOBILE', 'FLAG_PHONE', 'FLAG_EMAIL', 'REG_REGION_NOT_LIVE_REGION', 'REG_REGION_NOT_WORK_REGION', 'LIVE_REGION_NOT_WORK_REGION', 'REG_CITY_NOT_LIVE_CITY', 'REG_CITY_NOT_WORK_CITY', 'LIVE_CITY_NOT_WORK_CITY', 'EMERGENCYSTATE_MODE', 'FLAG_DOCUMENT_2', 'FLAG_DOCUMENT_3', 'FLAG_DOCUMENT_4', 'FLAG_DOCUMENT_5', 'FLAG_DOCUMENT_6', 'FLAG_DOCUMENT_7', 'FLAG_DOCUMENT_8', 'FLAG_DOCUMENT_9', 'FLAG_DOCUMENT_10', 'FLAG_DOCUMENT_11', 'FLAG_DOCUMENT_12', 'FLAG_DOCUMENT_13', 'FLAG_DOCUMENT_14', 'FLAG_DOCUMENT_15', 'FLAG_DOCUMENT_16', 'FLAG_DOCUMENT_17', 'FLAG_DOCUMENT_18', 'FLAG_DOCUMENT_19', 'FLAG_DOCUMENT_20', 'FLAG_DOCUMENT_21']
============================================================================================================================================================================
Variables Categóricas: 16 ['NAME_CONTRACT_TYPE', 'CODE_GENDER', 'NAME_TYPE_SUITE', 'NAME_INCOME_TYPE', 'NAME_EDUCATION_TYPE', 'NAME_FAMILY_STATUS', 'NAME_HOUSING_TYPE', 'OCCUPATION_TYPE', 'REGION_RATING_CLIENT', 'REGION_RATING_CLIENT_W_CITY', 'ORGANIZATION_TYPE', 'FONDKAPREMONT_MODE', 'HOUSETYPE_MODE', 'WALLSMATERIAL_MODE', 'CNT_CHILDREN', 'NWEEKDAY_PROCESS_START']
============================================================================================================================================================================
Variables Continuas: 69 ['AMT_INCOME_TOTAL', 'AMT_CREDIT', 'AMT_ANNUITY', 'AMT_GOODS_PRICE', 'REGION_POPULATION_RELATIVE', 'DAYS_REGISTRATION', 'OWN_CAR_AGE', 'CNT_FAM_MEMBERS', 'EXT_SOURCE_1', 'EXT_SOURCE_2', 'EXT_SOURCE_3', 'APARTMENTS_AVG', 'BASEMENTAREA_AVG', 'YEARS_BEGINEXPLUATATION_AVG', 'YEARS_BUILD_AVG', 'COMMONAREA_AVG', 'ELEVATORS_AVG', 'ENTRANCES_AVG', 'FLOORSMAX_AVG', 'FLOORSMIN_AVG', 'LANDAREA_AVG', 'LIVINGAPARTMENTS_AVG', 'LIVINGAREA_AVG', 'NONLIVINGAPARTMENTS_AVG', 'NONLIVINGAREA_AVG', 'APARTMENTS_MODE', 'BASEMENTAREA_MODE', 'YEARS_BEGINEXPLUATATION_MODE', 'YEARS_BUILD_MODE', 'COMMONAREA_MODE', 'ELEVATORS_MODE', 'ENTRANCES_MODE', 'FLOORSMAX_MODE', 'FLOORSMIN_MODE', 'LANDAREA_MODE', 'LIVINGAPARTMENTS_MODE', 'LIVINGAREA_MODE', 'NONLIVINGAPARTMENTS_MODE', 'NONLIVINGAREA_MODE', 'APARTMENTS_MEDI', 'BASEMENTAREA_MEDI', 'YEARS_BEGINEXPLUATATION_MEDI', 'YEARS_BUILD_MEDI', 'COMMONAREA_MEDI', 'ELEVATORS_MEDI', 'ENTRANCES_MEDI', 'FLOORSMAX_MEDI', 'FLOORSMIN_MEDI', 'LANDAREA_MEDI', 'LIVINGAPARTMENTS_MEDI', 'LIVINGAREA_MEDI', 'NONLIVINGAPARTMENTS_MEDI', 'NONLIVINGAREA_MEDI', 'TOTALAREA_MODE', 'OBS_30_CNT_SOCIAL_CIRCLE', 'DEF_30_CNT_SOCIAL_CIRCLE', 'OBS_60_CNT_SOCIAL_CIRCLE', 'DEF_60_CNT_SOCIAL_CIRCLE', 'DAYS_LAST_PHONE_CHANGE', 'AMT_REQ_CREDIT_BUREAU_HOUR', 'AMT_REQ_CREDIT_BUREAU_DAY', 'AMT_REQ_CREDIT_BUREAU_WEEK', 'AMT_REQ_CREDIT_BUREAU_MON', 'AMT_REQ_CREDIT_BUREAU_QRT', 'AMT_REQ_CREDIT_BUREAU_YEAR', 'DAYS_BIRTH', 'DAYS_EMPLOYED', 'DAYS_ID_PUBLISH', 'HOUR_APPR_PROCESS_START']
============================================================================================================================================================================
Variables no clasificadas: 0 []
Out[15]:
(['TARGET',
  'FLAG_OWN_CAR',
  'FLAG_OWN_REALTY',
  'FLAG_MOBIL',
  'FLAG_EMP_PHONE',
  'FLAG_WORK_PHONE',
  'FLAG_CONT_MOBILE',
  'FLAG_PHONE',
  'FLAG_EMAIL',
  'REG_REGION_NOT_LIVE_REGION',
  'REG_REGION_NOT_WORK_REGION',
  'LIVE_REGION_NOT_WORK_REGION',
  'REG_CITY_NOT_LIVE_CITY',
  'REG_CITY_NOT_WORK_CITY',
  'LIVE_CITY_NOT_WORK_CITY',
  'EMERGENCYSTATE_MODE',
  'FLAG_DOCUMENT_2',
  'FLAG_DOCUMENT_3',
  'FLAG_DOCUMENT_4',
  'FLAG_DOCUMENT_5',
  'FLAG_DOCUMENT_6',
  'FLAG_DOCUMENT_7',
  'FLAG_DOCUMENT_8',
  'FLAG_DOCUMENT_9',
  'FLAG_DOCUMENT_10',
  'FLAG_DOCUMENT_11',
  'FLAG_DOCUMENT_12',
  'FLAG_DOCUMENT_13',
  'FLAG_DOCUMENT_14',
  'FLAG_DOCUMENT_15',
  'FLAG_DOCUMENT_16',
  'FLAG_DOCUMENT_17',
  'FLAG_DOCUMENT_18',
  'FLAG_DOCUMENT_19',
  'FLAG_DOCUMENT_20',
  'FLAG_DOCUMENT_21'],
 ['NAME_CONTRACT_TYPE',
  'CODE_GENDER',
  'NAME_TYPE_SUITE',
  'NAME_INCOME_TYPE',
  'NAME_EDUCATION_TYPE',
  'NAME_FAMILY_STATUS',
  'NAME_HOUSING_TYPE',
  'OCCUPATION_TYPE',
  'REGION_RATING_CLIENT',
  'REGION_RATING_CLIENT_W_CITY',
  'ORGANIZATION_TYPE',
  'FONDKAPREMONT_MODE',
  'HOUSETYPE_MODE',
  'WALLSMATERIAL_MODE',
  'CNT_CHILDREN',
  'NWEEKDAY_PROCESS_START'],
 ['AMT_INCOME_TOTAL',
  'AMT_CREDIT',
  'AMT_ANNUITY',
  'AMT_GOODS_PRICE',
  'REGION_POPULATION_RELATIVE',
  'DAYS_REGISTRATION',
  'OWN_CAR_AGE',
  'CNT_FAM_MEMBERS',
  'EXT_SOURCE_1',
  'EXT_SOURCE_2',
  'EXT_SOURCE_3',
  'APARTMENTS_AVG',
  'BASEMENTAREA_AVG',
  'YEARS_BEGINEXPLUATATION_AVG',
  'YEARS_BUILD_AVG',
  'COMMONAREA_AVG',
  'ELEVATORS_AVG',
  'ENTRANCES_AVG',
  'FLOORSMAX_AVG',
  'FLOORSMIN_AVG',
  'LANDAREA_AVG',
  'LIVINGAPARTMENTS_AVG',
  'LIVINGAREA_AVG',
  'NONLIVINGAPARTMENTS_AVG',
  'NONLIVINGAREA_AVG',
  'APARTMENTS_MODE',
  'BASEMENTAREA_MODE',
  'YEARS_BEGINEXPLUATATION_MODE',
  'YEARS_BUILD_MODE',
  'COMMONAREA_MODE',
  'ELEVATORS_MODE',
  'ENTRANCES_MODE',
  'FLOORSMAX_MODE',
  'FLOORSMIN_MODE',
  'LANDAREA_MODE',
  'LIVINGAPARTMENTS_MODE',
  'LIVINGAREA_MODE',
  'NONLIVINGAPARTMENTS_MODE',
  'NONLIVINGAREA_MODE',
  'APARTMENTS_MEDI',
  'BASEMENTAREA_MEDI',
  'YEARS_BEGINEXPLUATATION_MEDI',
  'YEARS_BUILD_MEDI',
  'COMMONAREA_MEDI',
  'ELEVATORS_MEDI',
  'ENTRANCES_MEDI',
  'FLOORSMAX_MEDI',
  'FLOORSMIN_MEDI',
  'LANDAREA_MEDI',
  'LIVINGAPARTMENTS_MEDI',
  'LIVINGAREA_MEDI',
  'NONLIVINGAPARTMENTS_MEDI',
  'NONLIVINGAREA_MEDI',
  'TOTALAREA_MODE',
  'OBS_30_CNT_SOCIAL_CIRCLE',
  'DEF_30_CNT_SOCIAL_CIRCLE',
  'OBS_60_CNT_SOCIAL_CIRCLE',
  'DEF_60_CNT_SOCIAL_CIRCLE',
  'DAYS_LAST_PHONE_CHANGE',
  'AMT_REQ_CREDIT_BUREAU_HOUR',
  'AMT_REQ_CREDIT_BUREAU_DAY',
  'AMT_REQ_CREDIT_BUREAU_WEEK',
  'AMT_REQ_CREDIT_BUREAU_MON',
  'AMT_REQ_CREDIT_BUREAU_QRT',
  'AMT_REQ_CREDIT_BUREAU_YEAR',
  'DAYS_BIRTH',
  'DAYS_EMPLOYED',
  'DAYS_ID_PUBLISH',
  'HOUR_APPR_PROCESS_START'],
 [])

Save the new CSV

In [16]:
df.to_csv('/Users/miguelflores/Desktop/CSV/pd_data_initial_preprocessing.csv')
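
Because SK_ID_CURR was set as the index when the data was loaded, to_csv writes it as the first column of the file by default. A later notebook can restore it by passing index_col to read_csv. The sketch below shows that round trip on a two-row toy frame written to a temporary directory; the toy values and the temporary path are illustrative assumptions, not the real file.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two-row toy frame indexed by SK_ID_CURR, mimicking the structure of df.
toy = pd.DataFrame(
    {"TARGET": [1, 0], "AMT_CREDIT": [406597.5, 1293502.5]},
    index=pd.Index([100002, 100003], name="SK_ID_CURR"),
)

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "pd_data_initial_preprocessing.csv"
    toy.to_csv(out_path)  # the index is written as a column named SK_ID_CURR
    reloaded = pd.read_csv(out_path, index_col="SK_ID_CURR")
```

Reading the file without index_col would instead produce a default integer index and leave SK_ID_CURR as an ordinary column.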